Abstract: Sparse structure learning of Bayesian networks simplifies the network structure without losing important information from the original structure. In this paper, the necessity of sparse structure learning for Bayesian networks and the definition of network sparsity are first discussed. Starting from general Bayesian network structure learning, the problems that arise with high-dimensional data are analyzed, and score-based structure learning is found to be well suited to sparse structure learning. The objective functions and their optimization algorithms are therefore described in detail. Finally, some promising research directions are discussed.
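As a concrete illustration of the score-based approach the survey focuses on, the following is a minimal sketch, not any specific algorithm from the paper: it learns a sparse Gaussian Bayesian network by node-wise l1-penalized (lasso) regression under the assumption that a topological ordering of the variables is known, so acyclicity holds by construction. The function name, the use of scikit-learn's Lasso, the regularization value, and the toy data are all illustrative assumptions.

```python
# Illustrative sketch: sparse Gaussian Bayesian network recovery by node-wise
# lasso regression, given an assumed topological ordering of the nodes.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_dag_given_ordering(X, ordering, lam=0.1):
    """X: (n_samples, p) data matrix; ordering: assumed topological order of the p nodes."""
    p = X.shape[1]
    B = np.zeros((p, p))              # B[i, j] != 0  <=>  edge i -> j
    for k, j in enumerate(ordering[1:], start=1):
        parents = ordering[:k]        # candidate parents = predecessors in the ordering
        model = Lasso(alpha=lam).fit(X[:, parents], X[:, j])
        B[parents, j] = model.coef_   # the l1 penalty drives most coefficients to zero
    return B

# Toy usage: a chain 0 -> 1 -> 2 with Gaussian noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 0.8 * x0 + rng.normal(size=500)
x2 = -0.6 * x1 + rng.normal(size=500)
B_hat = sparse_dag_given_ordering(np.column_stack([x0, x1, x2]), ordering=[0, 1, 2])
print(np.round(B_hat, 2))             # nonzero entries indicate the learned sparse edges
```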